Amendments to the Claims

This listing of claims will replace all prior versions, and listings, of claims in the 
application: 

Listing of Claims: 

1 . (Currently Amended) A system that facilitates enhancement of a speech signal, 
comprising: 

an input component that receives a speech signal and pixel-based image data 
relating to an originator of the speech signal; and[[,]] 

a speech enhancement component that is configured to infer correlat[[es]]ions between
the speech signal and the pixel-based image data so as to facilitate discrimination of noise
from the speech signal by employing a probabilistic-based model comprising a video
embedded subspace model fused with an audio mixture model such that a hidden
variable[[s]] that represents the pixel-based image data in lower dimensions depends on a
state variable of the speech signal.

2. (Currently Amended) The system of claim 1, the probabilistic-based model comprising
an audio model, wherein the audio model is based[[,]] at least in part[[,]] upon:

$$p(s) = \pi_s$$

$$p(w \mid u) = \prod_k \mathcal{N}(w_k \mid h u_k, \omega)$$

where $u_k$ is a clean speech signal,
$w_k$ is the speech signal,
$s$ is [[a]]the state variable of the speech signal, and[[,]]

the notation $\mathcal{N}(x \mid \mu, \nu)$ denotes a Gaussian distribution over random variable

$x$ with mean $\mu$ and inverse covariance $\nu$.
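As a minimal sketch of the precision-parameterized notation $\mathcal{N}(x \mid \mu, \nu)$ and the audio observation model of claim 2, the Python below evaluates $\log p(w \mid u)$. It assumes real-valued coefficients, a scalar gain $h$, and one noise precision $\omega$ shared by all bins; the function names are hypothetical, and the complex-valued case is not handled.

```python
import math


def log_gauss_prec(x, mean, prec):
    """Log of N(x | mean, prec): a 1-D Gaussian parameterized by its
    inverse covariance (precision) prec > 0, as in the N(x | mu, nu) notation."""
    return 0.5 * math.log(prec / (2.0 * math.pi)) - 0.5 * prec * (x - mean) ** 2


def log_p_w_given_u(w, u, h, omega):
    """log p(w|u) = sum_k log N(w_k | h * u_k, omega).

    Assumes real-valued coefficients, a scalar gain h and one noise
    precision omega shared by all frequency bins k (hypothetical choices)."""
    if len(w) != len(u):
        raise ValueError("w and u must have the same number of bins")
    return sum(log_gauss_prec(wk, h * uk, omega) for wk, uk in zip(w, u))


clean = [1.0, -0.5, 0.25]
close = log_p_w_given_u([1.0, -0.5, 0.25], clean, h=1.0, omega=10.0)
far = log_p_w_given_u([2.0, 0.5, -1.0], clean, h=1.0, omega=10.0)
print(close > far)  # True
```

An observation that matches $h u_k$ scores higher than one far from it, and a larger $\omega$ (less noise) sharpens that preference.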

3. (Currently Amended) The system of claim 1, the probabilistic-based model comprising
a video model, wherein the video model is based[[,]] at least in part[[,]] upon:

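$$p(l) = \text{const.}$$

$$p(v \mid r) = \prod_i \mathcal{N}(v_i \mid \mu_i + \Lambda_i r, \psi_i)$$

$$p(y \mid v, l) = \prod_i \mathcal{N}(y_i \mid v_{il}, \lambda)$$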
where $y$ is the pixel-based image data,

$r$ is [[a]]the hidden variable that represents the pixel-based image data in lower
dimensions,

$\Lambda$ is a matrix of weights for the hidden variable[[s]] $r$,

$l$ is a location parameter,

$v$ is a hidden clean pixel-based image,

$v_{il}$ is shorthand for $v_{\xi(x_i - x_l)}$,

$x_i$ is the position of the $i$-th pixel,

$x_l$ is the position represented by $l$, and[[,]]

$\xi(x)$ is the index of $v$ corresponding to 2D position $x$.
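The video model of claim 3 can be illustrated by scoring a location $l$ by how well the shifted clean image $v$ explains the observed pixels $y$. In this sketch the index map `xi` with cyclic wraparound, the flattened row-major image layout, and the helper names are all assumptions made for illustration. Because $p(l)$ is constant, the most probable $l$ is simply the maximum-likelihood shift.

```python
import math


def xi(pos, width, height):
    """Hypothetical index map: flat index of v for a 2D position (row, col),
    with cyclic wraparound at the image borders (an assumption)."""
    row, col = pos
    return (row % height) * width + (col % width)


def log_p_y_given_v_l(y, v, l, width, height, lam):
    """log p(y|v,l) = sum_i log N(y_i | v_{xi(x_i - x_l)}, lam) for flattened
    row-major images y, v and a shift l = (row, col)."""
    total = 0.0
    for r in range(height):
        for c in range(width):
            i = r * width + c                         # pixel i sits at x_i = (r, c)
            j = xi((r - l[0], c - l[1]), width, height)
            diff = y[i] - v[j]
            total += 0.5 * math.log(lam / (2 * math.pi)) - 0.5 * lam * diff * diff
    return total


def best_shift(y, v, width, height, lam):
    """With p(l) = const., the MAP location is the maximum-likelihood shift."""
    shifts = [(dr, dc) for dr in range(height) for dc in range(width)]
    return max(shifts, key=lambda l: log_p_y_given_v_l(y, v, l, width, height, lam))


# A single bright pixel at (0, 0) in v appears at (1, 2) in the observed frame y.
v = [1.0 if i == 0 else 0.0 for i in range(9)]
y = [1.0 if i == 1 * 3 + 2 else 0.0 for i in range(9)]
print(best_shift(y, v, 3, 3, lam=1.0))  # (1, 2)
```

An exhaustive search over shifts is used only for clarity; it does not reflect how the claimed variational treatment of $q(l)$ would be computed.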

4. (Currently Amended) The system of claim 1, wherein the probabilistic-based model
comprises an audio/video model, and the audio/video model is based[[,]] at least in part[[,]]
upon:

$$p(r \mid s) = \prod_j \mathcal{N}(r_j \mid \eta_{sj}, \Gamma_{sj})$$
where $r$ is [[a]]the hidden variable that represents the pixel-based image data in lower
dimensions,

$s$ is [[a]]the state variable of the speech signal,

$\Gamma$ is a precision matrix parameter associated with $s$, and[[,]]

$\eta$ is a mean parameter associated with $s$.
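Claim 4 makes the subspace variable $r$ depend on the speech state $s$. As a sketch only, the Python below evaluates $\log p(r \mid s)$ in the diagonal form above and shows how an observed $r$ shifts belief toward the state whose mean $\eta_s$ it matches, via $q(s \mid r) \propto \pi_s\, p(r \mid s)$. The posterior helper, the two-state setup and all parameter values are hypothetical and not taken from the claims.

```python
import math


def log_p_r_given_s(r, eta_s, gamma_s):
    """log p(r|s) = sum_j log N(r_j | eta_sj, gamma_sj): a diagonal Gaussian whose
    mean eta_s and precisions gamma_s are selected by the speech state s."""
    return sum(0.5 * math.log(g / (2 * math.pi)) - 0.5 * g * (rj - e) ** 2
               for rj, e, g in zip(r, eta_s, gamma_s))


def state_posterior(r, pi, eta, gamma):
    """q(s|r) proportional to pi_s * p(r|s), normalized with log-sum-exp."""
    logs = [math.log(pi[s]) + log_p_r_given_s(r, eta[s], gamma[s])
            for s in range(len(pi))]
    m = max(logs)
    weights = [math.exp(x - m) for x in logs]
    z = sum(weights)
    return [wt / z for wt in weights]


# Two hypothetical states with different subspace means and equal precisions.
post = state_posterior([1.9, 1.1], pi=[0.5, 0.5],
                       eta=[[0.0, 0.0], [2.0, 1.0]],
                       gamma=[[4.0, 4.0], [4.0, 4.0]])
print(post[1] > 0.99)  # True
```

This coupling is what lets lip appearance, summarized by $r$, inform which audio mixture component $s$ is active.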

5. (Currently Amended) The system of claim 1, wherein the speech enhancement
component is configured to infer the correlations between the speech signal and the
pixel-based image data based upon a variational expectation maximization algorithm
having an E-step and an M-step.



6. (Currently Amended) The system of claim 5, wherein the variational expectation
maximization algorithm is based[[,]] at least in part[[,]] on the equation:



$$p(u, s, r, v, l \mid y, w) \approx q(u \mid s)\, q(s)\, q(r \mid s)\, q(v \mid r, l)\, q(l)$$



where $u$ is a clean speech signal,

$s$ is [[a]]the state variable of the speech signal,

$r$ is [[a]]the hidden variable that represents the pixel-based image data in lower
dimensions,

$v$ is a hidden clean pixel-based image,
$y$ is the pixel-based image,
$w$ is the speech signal, and[[,]]
$l$ is a location parameter.

7. (Currently Amended) The system of claim 5, wherein the expectation maximization
algorithm is based[[,]] at least in part[[,]] on the equation:
$$\frac{1}{\omega} = |w_k|^2 - 2h\,\mathrm{Re}\left(w_k^{*}\, E u_k\right) + h^2 E|u_k|^2$$

where $u_k$ is a clean speech signal,

$w_k$ is the speech signal,

$\pi_s$ is a prior probability parameter of $s$, and

$\nu_{sk}$ is an inverse covariance parameter.
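The noise-precision update in claim 7 is the expected squared residual $E|w_k - h u_k|^2$. A hedged single-bin sketch follows. It assumes complex coefficients, posterior moments $E u_k$ and $E|u_k|^2$ supplied per frame by an E-step that is not shown, and averaging over frames; the name `noise_variance_update` is hypothetical.

```python
def noise_variance_update(w, Eu, Eu2, h):
    """Sketch of the 1/omega update for one frequency bin k, averaged over frames:
    mean of |w_k|^2 - 2 h Re(conj(w_k) E[u_k]) + h^2 E[|u_k|^2].

    w:   observed complex coefficients w_k, one per frame
    Eu:  posterior means E[u_k], one per frame (from an E-step not shown)
    Eu2: posterior second moments E[|u_k|^2], one per frame
    h:   real scalar gain (assumed)"""
    if not (len(w) == len(Eu) == len(Eu2)) or not w:
        raise ValueError("need matching, non-empty per-frame inputs")
    total = 0.0
    for wk, eu, eu2 in zip(w, Eu, Eu2):
        total += abs(wk) ** 2 - 2 * h * (wk.conjugate() * eu).real + h * h * eu2
    return total / len(w)


h = 2.0
w = [1 + 2j, -0.5 + 0.5j]
Eu = [wk / h for wk in w]
# Certain posterior (E|u|^2 = |E u|^2): the residual variance is ~0.
print(noise_variance_update(w, Eu, [abs(e) ** 2 for e in Eu], h))
# Posterior variance 0.25 per frame: the update returns about h^2 * 0.25 = 1.0.
print(noise_variance_update(w, Eu, [abs(e) ** 2 + 0.25 for e in Eu], h))
```

The second moment $E|u_k|^2$, rather than $|E u_k|^2$, is what keeps posterior uncertainty about the clean speech from being mistaken for zero noise.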

8. (Currently Amended) The system of claim 7, wherein the expectation maximization
algorithm is further based[[,]] at least in part[[,]] on the equation:

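$$\Lambda = \left(E vr^{T} - Ev\,Er^{T}\right)\left(E rr^{T} - Er\,Er^{T}\right)^{-1}$$
$$\mu = Ev - \Lambda Er$$

$$\psi^{-1} = \mathrm{Diag}\left(E vv^{T} - \Lambda E rv^{T} - \mu Ev^{T}\right)$$
where "Diag" refers to [[the]]a diagonal of the matrix, and[[,]]
$$Er = \sum_s \pi_s \eta_s$$
$$E vr^{T} = \sum_s \pi_s \left[(\Lambda \eta_s + \mu)\eta_s^{T} + \Lambda \Gamma_s^{-1}\right]$$

$$E vv^{T} = \sum_s \pi_s \left[(\Lambda \eta_s + \mu)(\Lambda \eta_s + \mu)^{T} + \Lambda \Gamma_s^{-1} \Lambda^{T}\right] + \psi^{-1}$$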
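The updates in claim 8 have the form of the familiar factor-analysis M-step. The sketch below specializes them to a scalar latent $r$, so the matrix inverse becomes a division, and it computes the moments as sample averages over fully observed $(v, r)$ pairs. That is an assumption for illustration: in the claimed variational EM these moments would be posterior expectations such as $Er = \sum_s \pi_s \eta_s$. The function name is hypothetical.

```python
def fa_m_step(vs, rs):
    """Factor-analysis M-step with a scalar latent r.

    vs: list of pixel vectors v (each of length d); rs: matching latent values r.
    Moments are sample averages over (v, r) pairs, an assumption standing in
    for the posterior expectations used in the claimed variational EM.
    Returns (Lambda, mu, psi_inv), where psi_inv is the diagonal of psi^{-1}."""
    n, d = len(rs), len(vs[0])
    Er = sum(rs) / n
    Err = sum(r * r for r in rs) / n
    Ev = [sum(v[i] for v in vs) / n for i in range(d)]
    Evr = [sum(v[i] * r for v, r in zip(vs, rs)) / n for i in range(d)]
    Evv = [sum(v[i] ** 2 for v in vs) / n for i in range(d)]  # diagonal of E vv^T

    var_r = Err - Er * Er                    # (E rr^T - Er Er^T), a scalar here
    lam = [(Evr[i] - Ev[i] * Er) / var_r for i in range(d)]
    mu = [Ev[i] - lam[i] * Er for i in range(d)]
    # Diag(E vv^T - Lambda E rv^T - mu Ev^T), taken entry by entry
    psi_inv = [Evv[i] - lam[i] * Evr[i] - mu[i] * Ev[i] for i in range(d)]
    return lam, mu, psi_inv


rs = [-1.0, 0.0, 1.0, 2.0]
vs = [[1.0 + 2.0 * r, -1.0 + 0.5 * r] for r in rs]   # noise-free v = mu + Lambda r
lam, mu, psi_inv = fa_m_step(vs, rs)
print([round(x, 6) for x in lam], [round(x, 6) for x in mu])  # [2.0, 0.5] [1.0, -1.0]
```

With noise-free data the updates recover $\Lambda$ and $\mu$ exactly, and the residual variance $\psi^{-1}$ goes to zero, which is a quick consistency check on the three formulas.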



9. (Currently Amended) The system of claim 8, wherein the expectation maximization
algorithm is further based[[,]] at least in part[[,]] on the equation:



10. (Currently Amended) The system of claim 1, wherein the pixel-based image data 
compris[[ing]]es information associated with an appearance of the lips of the originator 
of the speech signal. 

11. (Currently Amended) The system of claim 1, wherein the speech enhancement 
component that is configured to infer correlations between the speech signal and the 
pixel-based image data comprises a speech component that is configured to track[[s]] 
[[the]] lips of the originator of the speech signal in order to facilitate discrimination of 
noise from the speech signal. 

12. (Currently Amended) The system of claim 1, wherein the input component further
compris[[ing]]es a frequency transformation component that is configured to receive[[s]]
windowed signal inputs, compute[[s]] a frequency transform of the windowed signal[[s]]
inputs, and provide[[s]] outputs of the frequency transformed windowed signal[[s]] inputs
to the speech enhancement component.
13. (Currently Amended) The system of claim 12, further comprising a windowing
component that is configured to appl[[ies]]y an N-point window to the speech signal and
provide[[s]] [[the]] windowed signal inputs to the frequency transformation component.
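Claims 12 and 13 describe N-point windowing followed by a frequency transform of each windowed frame. A minimal sketch follows, assuming a Hann window, a hop of N/2 and a naive DFT; the claims do not name a window shape, hop size or transform algorithm, and the function names are hypothetical.

```python
import cmath
import math


def hann(n_points):
    """N-point (periodic) Hann window; the window shape is an assumption."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n_points) for i in range(n_points)]


def windowed_frames(signal, n_points, hop):
    """Windowing step: cut the speech signal into N-point frames and apply the window."""
    win = hann(n_points)
    return [[signal[start + i] * win[i] for i in range(n_points)]
            for start in range(0, len(signal) - n_points + 1, hop)]


def dft(frame):
    """Frequency transformation step: naive O(N^2) DFT of one windowed frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def stft(signal, n_points=8, hop=4):
    """Windowed frames -> per-frame spectra, i.e. the inputs to the enhancement stage."""
    return [dft(f) for f in windowed_frames(signal, n_points, hop)]


spec = stft([1.0] * 16, n_points=8, hop=4)
print(len(spec), round(spec[0][0].real, 6))  # 3 4.0
```

For a constant input, each frame's energy lands at DC (the window sum, N/2 = 4) and the two adjacent bins, which is the expected Hann leakage pattern. A practical system would use an FFT rather than this O(N^2) loop.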

14. (Currently Amended) The system of claim 1, further comprising at least two audio
input devices that are configured to provide speech signals.

15. (Currently Amended) The system of claim 1, wherein the probabilistic-based model is 
configured to be[[ing]] trained[[,]] at least in part[[,]] during operation of the system. 

16-17. (Canceled). 

18. (Currently Amended) A method of facilitating enhancement of a speech signal, 
comprising: 

receiving a speech signal; 

receiving [[a]] pixel-based image data relating to an originator of the speech
signal;

inferring correlations between the speech signal and the pixel-based image data 
using a probabilistic-based model comprising a video embedded subspace model fused 
with an audio mixture model such that a hidden variable that represents the pixel-based 
image data in lower dimensions depends on a state variable of the speech signal; and 

generating an enhanced speech signal based[[,]] at least in part[[,]] upon the
correlat[[es]]ions between the speech signal and the pixel-based image data so as to
facilitate discrimination of noise from the speech signal.

19. (Original) The method of claim 18 further comprising providing an output associated 
with the enhanced speech signal. 

20. (Currently Amended) A data packet configured to be transmitted between two or 
more computer components that are configured to facilitate[[s]] enhancement of a speech 
signal, the data packet comprising: 
an enhanced speech signal, the enhanced speech signal being generated at least in
part[[,]] upon utilizing a probabilistic-based model that is configured to infer
correlat[[es]]ions between a speech signal and image data related to an originator of the
speech signal, the probabilistic-based model comprising a video embedded subspace
model fused with an audio mixture model such that a hidden variable that represents the
image data in lower dimensions depends on a state variable of the speech signal so as to
facilitate discrimination of noise from the speech signal.

21. (Currently Amended) A computer readable medium storing computer executable 
components of a system that facilitates enhancement of a speech signal, the
computer executable components comprising:

an input component that is configured to receive[[s]] a speech signal and pixel-
based image data relating to an originator of the speech signal; and[[,]]

a speech enhancement component that is configured to employ[[s]] a probabilistic-
based model that is configured to correlate[[s]] between the speech signal and the image
data, the probabilistic-based model comprising a video embedded subspace model fused
with an audio mixture model such that a hidden variable that represents the image data in
lower dimensions depends on a state variable of the speech signal so as to facilitate
discrimination of noise from the speech signal.

22. (Currently Amended) A system that facilitates enhancement of a speech signal 
comprising: 

means for receiving a speech signal and pixel-based image data relating to an
originator of the speech signal; and[[,]]

means for enhancing the speech signal, the means for enhancing configured to
employ[[ing]] a probabilistic-based model that is configured to correlate[[s]] between the
speech signal and the image data, the probabilistic-based model comprising a video
embedded subspace model fused with an audio mixture model such that a hidden variable
that represents the image data in lower dimensions depends on a state variable of the
speech signal so as to facilitate discrimination of noise from the speech signal.